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Abstract 

This paper studies a fiilly Bayesian algorithm for endmember extraction and abundance estimation 
for hyperspectral imagery. Each pixel of the hyperspectral image is decomposed as a Unear combination 
of pure endmember spectra following the Unear mixing model. The estimation of the unknown end- 
member spectra is conducted in a unified manner by generating the posterior distribution of abundances 
and endmember parameters under a hierarchical Bayesian model. This model assumes conjugate prior 
distributions for these parameters, accounts for non-negativity and fuU-additivity constraints, and exploits 
the fact that the endmember proportions lie on a lower dimensional simplex. A Gibbs sampler is proposed 
to overcome the complexity of evaluating the resulting posterior distribution. This sampler generates 
samples distributed according to the posterior distribution and estimates the unknown parameters using 
these generated samples. The accuracy of the joint Bayesian estimator is illustrated by simulations 
conducted on synthetic and real AVIRIS images. 

Index Terms 

Hyperspectral imagery, endmember extraction, linear spectral unmixing, Bayesian inference, MCMC 
methods. 
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I. Introduction 

Over the last several decades, much research has been devoted to the spectral unmixing 
problem. Spectral unmixing is an efficient way to solve standard problems encountered in 
hyperspectral imagery. These problems include pixel classification [1], material quantification 
[2] and subpixel detection [3]. Spectral unmixing consists of decomposing a pixel spectrum 
into a collection of material spectra, usually referred to as endmembers, and estimating the 
corresponding proportions or abundances [4]. To describe the mixture, the most frequently 
encountered model is the macroscopic model which gives a good approximation in the reflective 
spectral domain ranging from 0.4yum to 2.5/im [5]. The linearization of the non-linear intimate 
model proposed by Hapke in [6] results in this macroscopic model [7]. The macroscopic model 
assumes that the observed pixel spectrum is a weighted linear combination of the endmember 
spectra. 

As reported in [4], linear spectral mixture analysis (LSMA) has often been handled as a two 
step procedure: the endmember extraction step and the inversion step, respectively. In the first 
step of analysis, the macroscopic materials that are present in the observed scene are identified 
by using an Endmember Extraction Algorithm (EEA). The most popular EEAs include PPI [8] 
and N-FINDR [9], that apply a linear model for the observations with non-negativity and fuU- 
additivit)!^ constraints. This model results in endmember spectra located on the vertices of a 
lower dimensional simplex. PPI and N-FINDR estimate this simplex by identifying the largest 
simplex contained in the data. Another popular alternative, called Vertex Component Analysis 
(VGA) has been proposed in [10]. A common assumption in VGA, PPI and N-FINDR is that they 
require pure pixels to be present in the observed scene, where pure pixels are pixels composed of 
a single endmember. Alternatively, Graig has proposed the Minimum Volume Transform (MVT) 
to find the smallest simplex that contains all the pixels [11]. However, MVT-based methods (e.g. 
ORASIS [12]) are not fully automated techniques: they provide results that strongly depend on 
i) the algorithm initialization, ii) some ad hoc parameters that have to be selected by the user. 
More generally, these previous EEAs avoid the difficult problem of direct parameter estimation 
on the simplex. The interested reader is invited to consult [13] and [14] for a recent performance 
comparison of some standard EEAs. 

'The full-additivity constraint, that will be detailed in the following section, refers to a unit ^i-norm. 
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The second step in LSMA, called the inversion step, consists of estimating the proportions of 
the materials identified by EEA [15]. The inversion step can use various strategies such as least 
square estimation [16], maximum likeUhood estimation [17] and Bayesian estimation [18]. 

The central premise of this paper is to propose an algorithm that estimates the endmember 
spectra and their respective abundances jointly in a single step. This approach casts LSMA 
as a blind source separation (BSS) problem [19]. In numerous fields, independent component 
analysis (ICA) [20] has been a mainstay approach to solve BSS problems. In hyperspectral 
imagery, ICA has also been envisaged [21]. However, as illustrated in [15] and [22], ICA may 
perform poorly for LSMA due to the strong dependence between the different abundances [23]. 
Inspired by ICA, dependent component analysis has been introduced in [24] to exploit this 
dependance. However, this approach assumes that the hyperspectral observations are noise-free. 
Alternatively, non-negative matrix factorization (NMF) [25] can also be used to solve BSS 
problem under non-negativity constraints. In [26], a NMF algorithm that consists of alternately 
updating the signature and abundance matrices has been successfully applied to identify con- 
stituent in chemical shift imaging. In this work, the additivity constraint has not been taken into 
account. Basic simulations conducted on synthetic images show that such MNF strategies lead 
to weak estimation performances. In [27], an iterative algorithm called ICE (iterated constrained 
endmembers) is proposed to minimize a penalized criterion. As noted in [24], results provided 
by ICE strongly depend on the choice of the algorithm parameters. More recently, in [28], Miao 
et al. have proposed another iterated optimization scheme performing NMF with an additivity 
constraint on the abundance coefficients. However, as this constraint has been included in the 
objective function, it is not necessarily ensured. In addition the performances of the algorithm 
in [28] decrease significantly when the noise level increases. 

The joint Bayesian model uses a Gibbs sampling algorithm to efficiently solve the constrained 
spectral unmixing problem without requiring the presence of pure pixels in the hyperspectral 
image. In addition, to our knowledge, this is the first time that non-negativity constraints for 
endmember spectra as well as hard additivity and non-negativity constraints for the abundances 
are jointly considered in hyperspectral imagery. 

In many works, Bayesian estimation approaches have been adopted to solve BSS problems 
(see for example [29]) Hke LSMA. The Bayesian formulation allows one to directly incorporate 
constraints into the model such as sparsity [30], non-negativity [31] and full additivity (sum- 
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to-one constraint) [32]. In this paper, prior distributions are proposed for the abundances and 
endmember spectra to enforce the constraints inherent to the hyperspectral mixing model. These 
constraints include non-negativity and fuU-additivity of the abundance coefficients (as in [18]) 
and non-negativity of the endmember spectra. The proposed joint LSMA approach is able 
to solve the endmember spectrum estimation problem directly on a lower dimensional space 
within a Bayesian framework. We believe that this is one of the principal factors leading to 



performance improvements that we show on simulated and real data in Sections |V] and |VIj By 
estimating the parameters on the lower dimensional space we effectively reduce the number of 
degrees of freedom of the parameters relative to other methods (e.g. [31]), translating into lower 
estimator bias and variance. The problem of hyperparameter selection in our Bayesian model is 
circumvented by adopting the hierarchical Bayesian approach of [18] that produces a parameter- 
independent Bayesian posterior distribution for the endmember spectra and abundances. To 
overcome the complexity of the full posterior distribution, a Gibbs sampling strategy is derived 
to approximate standard Bayesian estimators, e.g. the minimum mean squared error (MMSE) 
estimator. Moreover, as the full posterior distribution of all the unknown parameters is available, 
confidence interval can be easily computed. These measures allow one to quantify the accuracy 
of the different estimates. 

The paper is organized as follows. The observation model is described in Section |Ilj The 
different quantities necessary to the Bayesian formulation are enumerated in Section [nij Sec- 
tion llV] presents the proposed Gibbs sampler for joint abundance and endmember estimation. 



Simulation results obtained with synthetic and real AVIRIS data are reported in Sections |V] 
and VI respectively. Section |VII concludes the paper. An appendix provides details on our 
parameterization of the simplex and selecting relevant and tractable priors. 



II. Linear mixing model and problem statement 

Consider P pixels of an hyperspectral image acquired in L spectral bands. According to the 
linear mixing model (LMM), described for instance in [4], the L-spectrum jp = [ypA, . . . , 1/p,l]^ 
of the pth pixel (p = 1, . . . , P) is assumed to be a linear combination of R spectra m,. corrupted 
by an additive Gaussian noise 

R 

Yp = ^ ap,r + rip, (1) 
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where nir = [^r,!, • • • , "^r.i]"^ denotes the spectrum of the rth material, ^ is the fraction 
of the rth material in the pth observation, R is the number of materials, L is the number 
of available spectral bands and P is the number of observations (pixels). Moreover, in ([T|), 
Hp = [rip 1, . . . , ^^p,L]'^ is an additive noise sequence which is assumed to be an independent and 
identically distributed (i.i.d.) zero-mean Gaussian sequence with covariance matrix Sn = ct^Il, 
where II is the identity matrix of dimension L x L 

np~Ar(OL,S„). (2) 

The proposed model in (|2]) does not account for any possible correlation in the noise sequences 
but has been widely adopted in the literature [33]-[35]. However, some simulation results reported 
in paragraph |V-D will show that the proposed algorithm is robust to the violation of the i.i.d. 
noise assumption. Note finally that the model in ([T]) can be easily modified (see [36]) to handle 
more complicated noise models with different variances in each spectral band as in [37], or by 
taking into account correlations between spectral bands as in [18]. 

Due to physical considerations, described in [3], [18] or [38], the fraction vectors ap = 
[op 1, . . . , Op r]^ in ([T]) satisfy the following non-negativity and fuU-additivity (or sum-to-one) 
constraints: 

ap,r > 0, Vr = 

In other words, the p abundance vectors belong to the space 

^ = {a: ||a||^ = 1 and a ^ 0} , (4) 

where \\-\\^ is the ii norm ||x||^ = a ^ stands for the set of inequalities 

{ttr > 0}r=i R- Moreover, the endmember spectra component m^,; must satisfy non-negativity 
constraints: 

mr,i > 0, Vr = 1, . . . , i?, V/ = 1, . . . , L. (5) 
Considering all pixels, standard matrix notations yield: 

Y = MA + N, (6) 

where 

Y = [yi,...,yp], M = [rrii, . . . , m^] , 
A = [ai, . . . ,ap] , N = [ni, . . . , np] . 
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In this work, we propose to estimate A and M from the noisy observations Y under the 
constraints in (|3]) and Note that the unconstrained BSS problem for estimating M and A from 
Y is ill-posed: if {Y, A} is an admissible estimate then {YH,H^A} is also admissible for any 
unitary matrix H. In the LSMA problem, this non-uniqueness can be partially circumvented by 
additional constraints such as fuU-additivity, which enables one to handle the scale indeterminacy. 
Consequently, these unit £i-norm constraints on the abundance vectors avoid using more complex 
strategies for direct estimation of the scale [39]. Despite the constraints in ([3]) and (|5]), uniqueness 
of the couple {M, A} solution of the LSMA (|6]) is not systematically ensured. To illustrate this 
problem, 50 admissible solution^ have been depicted in Fig. [T] for i? = 2 endmembers involved 
in the mixing of P = 2500 pixels [40]. In the following section, the Bayesian model used for 
the LSMA is presented. 



1 




Wavelength (^jm) Wavelength (pm) 

Fig. 1. Range of admissible solution for two endmember spectra : construction concrete (left) and red brick (right). The actual 
endmember (red lines) are mixed according ([TJ under the constraints in ([3} with random proportions to obtain P = 2500 pixels. 
50 admissible solutions (blue lines) of the BSS problems in |6]l are generated using [40]. 

III. Bayesian model 

A. Likelihood 

The linear mixing model defined in ([T]) and the statistical properties in ([2]) of the noise vector 
result in a conditionally Gaussian distribution for the observation of the pth pixel: yp|M, ap, cr^ ~ 

^Admissible solutions refer to couples {M, A} that satisfy ([S} and l|5j and that follow the model ([T]l in the noise-free case. 
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A/" (Map, (t^Il). Therefore, the likelihood function of yp can be expressed as 

IIYp - Map II 



/ (yp|M, ap, a^) = ( ) exp 



2a2 



(8) 



where ||x|| = (x-^x) ^ is the £2 norm. Assuming the independence between the noise sequences 

Hp (jo = 1, ... , P), the likelihood function of all the observations Y is: 

p 

/(Y|M,A,a2) = J]/(yp|M,ap,a2). (9) 

p=i 

B. Prior model for the endmember spectra 

1 ) Dimensionality reduction: It is interesting to note that the unobserved matrix X = MA = 
Y — N is rank deficient under the linear model ([T|). More precisely, the set 

{R R 
X G R^; X = ^ A,.m^, ^ = 1, A^ > > (10) 
r=l r=l J 

is a (i? — 1) -dimensional convex polytope of R^ whose vertices are the R endmember spectra 
rrij. (r = 1, . . . , i?) to be recovered. Consequently, in the noise-free case, X can be represented 
in a suitable lower-dimensional subset Vk of R^ without lost of information. To illustrate this 
property, P = 1000 pixels resulting from a noise-free mixture of i? = 3 endmembers are 
represented in Fig. [2] As noted in [4], this dimensionality reduction is a common step of the 
LSMA, adopted by numerous EEAs, such as N-FINDR [9] or PPI [8]. Similarly, we propose to 
estimate the projection (r = 1, . . . , i?) of the endmember spectra 111^ in the subspace Vk- The 
identification of this subspace can be achieved via a standard dimension reduction procedure. In 
the sequel, we propose to define Vk as the subspace spanned by K orthogonal axes vi, . . . , vk 
identified by a principal component analysis (PC A) on the observations Y [41]: 

Vic = span(vi, . . . , Vk) . (11) 

The first two principal axes are identified in Fig. [2] for the synthetic hyperspectral data. In 
the following paragraph, the PCA is described. Note however that this PCA-based dimension 
reduction step can be easily replaced by other projection techniques, such as the maximum noise 



fraction (MNF) transform [42] that has been considered in paragraph V-D 
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Fig. 2. Example of hyperspectral data observed in 3 spectral bands. The mixed pixels (blue points) belong to the f?-dimensional 
convex polytope 5m (red lines) whose vertices are the endmembers spectra mi, . . . ,mR (red stars). The first two principal 
axes estimated by a PCA appear in dashed lines and define the projection subset Vk- 



2) PCA projection: The L x L empirical covariance matrix T of the data Y is given by: 

p 



T 

where y is the empirical mean: 
Let 



p 



y = ^J2yp- (13) 



p=i 



D = diag(Ai, ...,Xk), 

(14) 

V = [Vi,. . .,VKf 

denote respectively the diagonal matrix of the K highest eigenvalues and the corresponding 
eigenvector matrix of T. The PCA projection t.,. G of the endmember spectrum G 
is obtained as follows: 

= P (m, - y) , (15) 
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with P = D aV. Equivalently, 

IXlr = Utr + y, (16) 

with U = V^^D^ where V^^ = (V^V) ^ is the pseudo-inverse of V. Note that in the 
subspace Vr-i obtained for K = R — 1, the vectors {tr}j.^i ^ form a simplex that standard 
EEAs such as N-FINDR [9], MVT [11] and ICE [27] try to recover. In this paper, we estimate 
the vertices (r = 1, . . . , i?) of this simplex using a Bayesian approach. The Bayesian prior 
distributions for the projections (r = 1, . . . , i?) are introduced in the following paragraph. 

3) Prior distribution for the projected spectra: All the elements of the subspace Vk may not 
be appropriate projected spectra according to ( [T5] ). Indeed, the K x 1 vector has to ensure 
non-negativity constraints ([5]) of the corresponding reconstructed L x 1 spectra rrir. For each 
endmember rrif , straightforward computations establish that for any r = 1, . . . , i? 

{mi^r > 0, V/ = 1, . . . , L} ^ {t, e %.} , (17) 

where the set % C Vk is defined by the following L inequalities 

% = jt^; l/« + 5^ ui^ktk,r > 0, / = 1, . . . , l| , (18) 

with y = [l/i, . . . and U = [ui^k]- A conjugate]^ multivariate Gaussian distribution (MGD) 

A/7; (e^, s^Ia') truncated on the set %. is chosen as prior distribution for t,.. The probability 
density function (pdf) (■) of this truncated MGD is defined by: 

(tr|er, s^Ik) oc (t^je^, s^Ik) 1% (tr) , (19) 

where oc stands for "proportional to", 0(-|u, W) is the pdf of the standard MGD A/'(u, W) 
with mean vector u and covariance matrix W, and Ir^ (■) is the indicator function on the set 



, 1, if X G 7; : 
lr.(x) = <( (20) 
0, overwise. 



The normalizing constant Kr^ (e,., s^I^:) in ( [191 ) is defined as follows: 



Kr, {er, s'^Ik) = / (x|e^, s^Ij^) dx. (21) 

JT-r 



^For the main motivations of clioosing conjugate priors, see for instance [43, Chap. 3]. 
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This paper proposes to choose the mean vectors e,. (r = 1, . . . ,R) in ([19]) as the projected 
spectra of pure components previously identified by EEA, e.g., N-FINDR. The variances 
(r = 1, . . . , R) reflect the degree of confidence given to this prior information. When no additional 



knowledge is available, these variances are fixed to large values: sf 



50. 



By assuming a priori independence of the vectors tr {r = 1, . . . , R), the prior distribution for 
the projected endmember matrix T = [ti, . . . , t^] is 

R 

/ (T I E, s2) = J] (f)r^ (t,|e,,, s^Ik) , (22) 



r=l 



where E = [ei, . . . , e^j] and = [si, . . . ,s 



I]- 



C. Abundance prior 

For each observed pixel p, with the full additivity constraint in (|3]), the abundance vectors 
(p = 1, . . . , P) can be rewritten as 



0'P,R 



with 



O'p^R-l 



and ap^R = 1 — Yl,f=i (^p,r- Following the model in [18], the priors chosen for Cp (p = 1, . . . , P) 
are uniform distributions on the simplex S defined by: 

S = {cp; ||Cp||-^ < 1 and Cp ^ O} . (23) 



Choosing this prior distribution for Cp {p 



P) is equivalent to electing a Dirichlet 



distribution I) (1, ... , 1), i.e a uniform distribution on A defined in (|4]), as prior distribution 
for the full abundance vector ap [43, Appendix A]. However, the proposed reparametrization 



will prove to be well adapted to the Gibbs sampling strategy introduced in Section IV 



Under the assumption of statistical independence between the abundance vectors Cp {p = 
1, . . . , P), the full prior distribution for partial abundance matrix C = [ci, . . . , cp] can be 
written 

p 

f{C)^l[ls{cp). (24) 

p=i 

As noted in [18], the uniform prior distribution reflects the lack of a priori knowledge about 
the abundance vector. Moreover, for the BSS problem here, this imposes a strong constraint on 
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the size of the simplex to be recovered. As demonstrated in the Appendix, among two a priori 
equiprobable solutions of the BSS problem, this uniform prior allows one to favor a posteriori 
the solution corresponding to the polytope in the projection subset Vk with smallest volume. 
This property has also been exploited in [11]. 

D. Noise variance prior 

A conjugate prior is chosen for cr^: 

a2|j.,7~J^(^,|), (25) 

where XQ (|, |) denotes the inverse-gamma distribution with parameters | and |. As in previous 
works ([44] and [45]), the hyperparameter u will be fixed to v = 2. On the other hand, 7 will 
be a random and adjustable hyperparameter, whose prior distribution is defined below. 



E. Prior distribution for hyperparameter 7 

The prior for 7 is a non-informative Jeffreys' prior [46] which reflects the lack of knowledge 
regarding this hyperparameter: 

/ (7) oc (7) • (26) 

7 

F. Posterior distribution 

The posterior distribution of the unknown parameter vector 6 = {C, T, cr^} can be computed 
from marginalization using the following hierarchical structure 

f{e\Y) = J /(0,7|Y)c?7cx J /(Y|0)/(0|7)/(7)rf7, (27) 

where / (Y|0) and / (7) are defined in (|9]) and (|26]) respectively. Moreover, under the assumption 
of a priori independence between C, T and cr^, the following result can be obtained: 

/ (0|7) = / (C) / (T I E, s^) / {a' I u, 7) , (28) 



where / (C | E.s^), / (T) and / (a^ | z/,7) have been defined in Eq.'s (|24]), (|22]) and ([25]), 
respectively. 
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This hierarchical structure allows one to integrate out the hyperparameter 7 from the joint 
distribution f(0,'y\Y), yielding: 

p 

f{C,T,a'\Y)^l[ls M 



R 

X Y\ exp 

r=l 
P 



p=l 



2s2 



(tr) 



(29) 



X 



n 



r+l 



exp 



fUT 



Deriving the Bayesian estimators (e.g., MMSE or MAP) from the posterior distribution in (|29]) 
remains intractable. In such case, it is very common to use Markov chain Monte Carlo (MCMC) 
methods to generate samples asymptotically distributed according to the posterior distribution. 
The Bayesian estimators can then be approximated using these samples. The next section studies 



a Gibbs sampling strategy allowing one to generate samples distributed according to (29 1 



IV. Gibbs sampler 

Random samples (denoted by -^^^ where t is the iteration index) can be drawn from / (C, T, | Y) 
using a Gibbs sampler [47]. This MCMC technique consists of generating samples {C*^*-', T'^*\ cr^^*)} 
distributed according to the conditional posterior distributions of each parameter. 



A. Sampling from f (C|T, cr^, Y) 

Straightforward computations yield for each observation: 

/(cp|T,a2,y,) 



oc exp 



IC„ - Vr,]' {C„ - Vr 



where: 



l5(cp), (30) 



(31) 



with S^^ = = [1, . . . , 1]^ G M^"^ and where M.r denotes the matrix M whose i?th 

column has been removed. As a consequence, Cp|T,cr^,yp is distributed according to an MGD 
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Algorithm 1: 
Gibbs sampling algorithm for LSMA 

• Preprocessing: 

- Compute the empirical mean vector y in ( [T3| ), 

- Compute the matrices D and V in ( [T4) l via a PCA, 

- SetU= (V^V)"'V^D5, 

- For r = 1 , . . . , i?, choose the a priori estimated endmembers £ Vk in ( fT9l l, 

• Initialization: 

- For r — 1, . . . , i?, sample t^r^ from ( [T9] l, 

- For r = 1, . . . , i?, set m^°^ = Ut[°^ + y, 

- Sample cr2(o) fj-om (|25]), 

- Set i <- 1, 

• Iterations: for t = 1, 2, . . . , do 

1 . For p = I, . . . , P, sample Cp*' from ( [32] l, 

2. For r = 1, .... i?, for A: = 1, . . . , iiT, sample from ( [37] i, 

3. For r = 1, . . . , i?, set m^.*^ = Utf°^ + y, 

4. Sample cr^C*) from (|39]|. 

5. Sett^t + 1. 



truncated on the simplex S in ( [23] ): 

Cp\T,a^,yp r^^^sivp,l:p). (32) 

Note that samples can be drawn from an MGD truncated on a simplex using efficient Monte 
Carlo simulation strategies such as described in [48]. 
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B. Sampling from / (T|C, a^, Y) 

Define T.^ as the matrix T wliose rth column has been removed. Then the conditional posterior 
distribution of (r = 1, . . . , i?) is: 



/ (t^|T.^,c^,a^Y) cx 



exp 



2 "^t) (^r "^r 



H (tr) , 



(33) 



with 



A. 



Tr = Ar 



J2 «?,rU^S-^ep,, + — e 



p=i 



(34) 



and 



^P,j^j- 



(35) 



Note that nij = Utj + y. As a consequence, the posterior distribution of is the following 
truncated MGD: 



I T.,., Cr, a^ Y ~ TVt;. {Tr, Ar) ■ 



(36) 



Generating vectors distributed according to this distribution is a difficult task, mainly due to 
the truncation on the subset %. An alternative consists of generating each component tk^r of t,. 
conditionally upon the others t.^ = {tj.rj^y^. More precisely, by denoting U/^ = {I; ui^k > 0}, 

l^k = < 0} and ei^k,r = Vi + Y.j^k ^i,j^j,r^ one can write 



with 



ifc.rlt-fc.r, T.J., Cr, (T^, Y ~ ATr,- ,+ 1 (Wk,r,zl^) 

I k,r'' k,rj ' 



(37) 



tj^^^ = max 



q = min 



(38) 



leu,: ui^k 

and where Wk,r and zl ^. are the conditional mean and variance respectively, derived from the 
partitioned mean vector and covariance matrix [49, p. 324] (see [48] for similar computations). 
Generating samples distributed according to the two-sided truncated Gaussian distribution in ( |37| ) 
can be easily achieved with the algorithm described in [50]. 
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Fig. 3. Actual endmembers (black lines), endmembers estimated by N-FINDR (blue lines), endmembers estimated by VCA 
(green lines) and endmembers estimated by proposed approach (red lines). 



C. Sampling from f (cr^ | C, T, Y) 

The conditional distribution of (t^|C,T, Y is the following inverse Gamma distribution: 

a'\C, T,Y r^Xg (^^, Ijljyp- M^^'ll') • (39) 

Simulating according to this inverse Gamma distribution can be achieved using a Gamma variate 
generator (see [51, Ch. 9] and [43, Appendix A]). 

To summarize, the hyperparameters that have to be fixed at the beginning of the algorithm are 
the following: u = 2, = . . . = sj^ = 50 and {e^}^^]^ are set to projected spectra identified 
by a standard EEA (e.g., N-FINDR). 

V. Simulations on synthetic data 

To illustrate the accuracy of the proposed algorithm, simulations are conducted on a 100 x 100 
synthetic image. This hyperspectral image is composed of three different regions with R = 3 
pure materials representative of a sub-urban scene: construction concrete, green grass and red 
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TABLE I 

Abundance means and variances of each endmember in each region of the 100 x 100 hyperspectral image. 





Region #1 


Region #2 


Region #3 


mean 


var. 


mean 


var. 


mean 


var. 


Endm. #1 


0.60 


0.01 


0.25 


0.01 


0.25 


0.02 


Endm. #2 


0.20 


0.02 


0.50 


0.01 


0.15 


0.005 


Endm. #3 


0.20 


0.01 


0.25 


0.02 


0.60 


0.02 



brick. The spectra of these endmembers have been extracted from the spectral libraries distributed 
with the ENVI software [52] and are represented in Fig. [3] (top, black lines). The reflectances are 
observed in L = 413 spectral bands ranging from OAfim to 2.5/im. These R = 3 components 
have been mixed with proportions that have been randomly generated according to truncated 
MGDs reflecting the means and variances reported in Table |I} The generated abundance maps 
have been depicted in Fig. |5] (top) in gray scale where a white (resp. black) pixel stands for the 
presence (resp. absence) of the material. The signal-to-noise ratio has been tuned to SNRdB = 
15dB. 



A. Endmember spectrum estimation 

The resulting hyperspectral data have been unmixed by the proposed algorithm. First, the 
space Vk in ( fTT] ) has been identified by PCA as discussed in paragraph III-B.2 The hidden 



mean vectors (r = 1, . . . , i?) of the normal distributions in ( [19] ) have been chosen as the PCA 
projections of endmembers previously identified by N-FINDR. The hidden variances have 
all been chosen equal to = . . . = s|. = 50 to obtain vague priors (i.e. large variances). The 
Gibbs sampler has been run with A^mc = 1300 iterations, including A'bi = 300 bum-in iterations. 
The MMSE estimates of the abundance vectors ap {p = 1, . . . , P) and the projected spectra t.^ 
(r = 1, . . . , i?) have been approximated by computing empirical averages over the last computed 
outputs of the sampler \ a^*^ \ and < t^*-* \ , following the MMSE principle: 

I J t=l,...,NMC I- J t=l,...,NMC 

Xmmse = E [x|y] 



1 ^MC (40) 

1 — ^ 



March 17, 2009 



DRAFT 



17 



4r 




3 1 1 1 1 1 1 1 

-3-2-1 1 2 3 

Band #1 



Fig. 4. Scatter plot in the lower-dimensional space V2' projected dataset (black points), actual endmembers (black circles), 
endmembers estimated by N-FINDR (blue stars), endmembers estimated by VGA (green stars) and endmembers estimated by 
proposed approach (red stars). All pixel spectra do not lie inside ground truth simplex due to simulated measurement noise. 

The corresponding endmember spectra estimated by the proposed algorithm are depicted in 
Fig. [3] (top, red lines). The proposed algorithm clearly outperforms N-FINDR and VGA, as 
shown in Fig. |3| The scatter plot in Fig. |4] provides additional insight. The N-FINDR and VGA 
algorithms assume the presence of pure pixels in the data. However, as none of these pixels are 
pure, N-FINDR and VGA provide poorer results than the proposed joint Bayesian algorithm. To 
illustrate this point, the performances of the different algorithms have been compared via two 
criteria. First, the mean square errors (MSEs) 

MSE^ = ||m^-m^f , r = l,...,R (41) 

are good quality indicators for the estimates. In addition, another metric frequently encountered 
in hyperspectral imagery literature, known as the spectral angle distance (SAD), has been 
considered. The SAD measures the angle between the actual and the corresponding estimated 
spectrum: 

SAD, = arccos ( /"^^'^'M , (42) 
V||m,.|| ||m,.||y 
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where (■, •) stands for the scalar product. These performance criteria computed for the endmember 



spectra estimated by the different algorithm have been reported in Table III They show that the 
proposed method performs significantly better than the others. The computation times required 
by each of these algorithms are reported in Table |ll] for a unoptimized MATLAB 2007b 32bit 
implementation on a 2.2GHz Intel Core 2. Obviously, the complexity of the VCA and N-FINDR 
methods are lower than the proposed approach. Note however that, contrary to the joint Bayesian 
procedure, these standard EEA have to be coupled with an abundance estimation algorithm. 
Moreover they only provide point estimates of the endmember spectra. Note finally that the 
computational complexity of N-FINDR, because it is combinatorial, increases drastically with 
the number of pixels and endmembers. 



TABLE II 

Computational times of VCA, N-FINDR and the proposed Bayesian method for unmixing P = 32 x 32 

PIXELS. 





Bayesian 


VCA 


N-FINDR 


Times (s) 


3250 


1 


23 



B. Abundance estimation 

The MMSE estimates of the abundance vectors for the P = 10^ pixels of the image have 



Nmc 



been computed following the MMSE principle in ( [40| ) 

^ Nmc 

The corresponding estimated abundance maps are depicted in Fig. |5] (bottom) and are clearly in 
good agreement with the simulated maps (top). 

Note that the proposed Bayesian estimation provides the joint posterior distribution of the 
unknown parameters. Specifically, these posteriors allow one to derive confidence intervals 
regarding the parameters of interest. For instance, the posterior distributions of the abundance 
coefficients is depicted in Fig. [6] for the pixel #100. Note that these estimated posteriors are in 
good agreement with the actual values of aioo depicted in red dotted lines. 
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TABLE m 

Performance comparison between VCA, N-FINDR and the proposed Bayesian method: MSE^ and SAD 
(x 10~^) between the actual and the estimated endmember spectra. 







End. 


Bayesian 


VCA 


N-FINDR 






MSE^ 


SAM 


MSE^ 


SAM 


MSE^ 


SAM 




CO 


#1 


1.70 


0.63 


17.80 


1.91 


2.69 


0.75 




II 


#2 


6.56 


1.49 


10.87 


1.80 


10.87 


1.80 


= 5dB 




#3 


2.70 


0.59 


12.71 


1.40 


2.94 


0.60 




#1 


0.70 


0.01 


64.47 


3.93 


49.16 


3.46 


SNR : 




#2 


1.05 


0.49 


47.21 


3.13 


15.46 


2.07 




II 


#1 
#i 


1.04 


0.49 


17.11 


1.81 


17.11 


1.81 






#4 


1.05 


0.49 


13.43 


1.23 


6.92 


0.79 






#3 


1.04 


0.49 


16.12 


1.45 


4.92 


0.88 




CO 


#1 


0.10 


0.15 


1.29 


0.48 


0.54 


0.33 




II 


#2 


2.68 


0.92 


15.59 


2.12 


5.19 


1.26 


m 




#3 


0.16 


0.12 


4.35 


0.71 


0.57 


0.22 


= 15d 




44- 1 
#1 


0.12 


0.17 


0.70 


0.40 


0.70 


0.40 


SNR: 


LO 


#2 


0.97 


0.52 


11.34 


1.44 


10.57 


1.68 




II 


#3 


0.26 


0.22 


4.07 


0.72 


0.43 


0.25 






#4 


0.40 


0.15 


2.36 


0.44 


7.67 


0.47 






#3 


0.24 


0.18 


1.54 


0.43 


2.93 


0.60 




CO 


#1 


0.05 


0.09 


1.14 


0.52 


1.14 


0.52 




1 


#2 


2.19 


0.83 


5.65 


1.33 


5.65 


1.33 


m 




#3 


0.17 


0.14 


0.66 


0.22 


0.66 


0.22 


= 25d 




#1 


0.42 


0.29 


7.62 


1.32 


7.62 


1.32 


SNR: 




#2 


0.37 


0.34 


27.16 


2.23 


20.66 


2.40 




II 


#3 


0.46 


0.29 


6.75 


1.10 


2.26 


0.65 






#4 


0.07 


0.09 


10.02 


0.93 


10.88 


0.70 






#5 


0.35 


0.20 


3.73 


0.61 


3.71 


0.62 
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Fig. 5. Top: actual endmember abundance maps. Bottom: estimated endmember abundance maps. 
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Fig. 6. Posterior distributions of ap^r (r — 1, . . . , 3). The actual values are depicted in red dotted lines. 



These results have been compared with estimates provided by the N-FINDR or VGA algo- 
rithms, coupled with an abundance estimation procedure based on the Fully Constrained Least- 
Squares (FCLS) approach proposed by Heinz et al. [16]. The global abundance MSEs have been 
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computed following: 



GMSE' 



p=i 



a. 



'p,r 



(44) 



where is the estimated abundance coefficient of the material #r in the pixel #p. These 



perf"ormance measures have been reported in Table IV and confirm the accuracy of the proposed 
Bayesian estimation method. Moreover, note that neither N-FINDR nor VGA are able to provide 
confidence measures as those depicted in Fig |6] 

TABLE IV 

Performance comparison between VCA, N-FINDR and the proposed Bayesian method: GMSE^ between the 

actual and the estimated abundance maps. 





Bayesian 


VCA 


N-FINDR 


Endm. #1 


25.68 


57.43 


30.66 


Endm. #2 


29.97 


74.48 


46.45 


Endm. #3 


3.19 


83.02 


11.22 



C. Other simulation scenarios 



Simulations have also been conducted with different noise levels (SNRdB = 5dB, 25dB) and 
when i? = 3 or i? = 5 endmembers are involved in the mixture. Estimation performances for 
the VCA and N-FINDR algorithms, as well as the proposed approach, have been summarized 



in Table III These results expressed in terms of MSE and SAD corroborate the effectiveness of 
our Bayesian estimation procedure. 



D. Robustness to non-i.i.d noise models 

In this paragraph, we illustrate the robustness of the proposed algorithm with respect to 
violation of the i.i.d. noise assumption. More precisely, a so-called Gaussian shaped noise inspired 
by [53] has been considered. The noise correlation matrix Sn = diag (a^, . . . , cr|) is designed 
such that its diagonal elements af (l = 1, . . . , L) follow a Gaussian shape centered at band L/2 

{l-L/2f 



cr, 



a exp 



(45) 
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TABLE V 

Performance comparison between VCA, N-FINDR and the proposed Bayesian method in presence of 
Gaussian shaped noise: MSE^ and SAD (xlO^^) between the actual and the estimated endmember spectra. 





MNF + Bayesian 


HySime 


+ Bayesian 


VCA 


N-FINDR 




MSE^ 


SAD 


MSE^ 


SAD 


MSE^ 


SAD 


MSE^ 


SAD 


Endm. #1 


0.26 


0.25 


0.42 


0.31 


1.11 


0.46 


1.11 


0.46 


Endm. #2 


1.99 


0.79 


4.35 


1.16 


5.78 


1.33 


5.78 


1.33 


Endm. #3 


0.33 


0.19 


0.57 


0.22 


1.94 


0.41 


2.19 


0.43 



The parameter cr^ can be tuned to choose the SNR whereas the parameter 77 adjusts the shape 
width (i] ^ 00 corresponds to i.i.d. noise). For this simulation, the parameters cr^ and r] have 
been fixed to 1.0 x 10^ and 50 respectively, leading to a noise level of SNRdB = 15dB. 

When the noise is not i.i.d., dimensionality reduction methods based on eigen-decomposition 



of observed data correlation matrix T introduced in (12) can be inefficient. In this case, other 



hyperspectral subspace identification methods have to be considered. Here the PCA-based di- 



mension reduction step introduced in paragraph III-B.2 was replaced by two techniques: the 
well-known MNF transform [42] approach and the more recently introduced HySime algorithm 
[53]. Both of them requires noise covariance matrix Sn estimation, which was implemented 
following [53]. The estimation performances for the proposed Bayesian estimation procedure 
coupled with MNF or HySime are reported in Table |V] and compared with VCA and N-FINDR. 
These results show that the proposed method i) can be easily used with other dimension reduction 
procedures, and ii) is quite robust to the i.i.d. noise assumption. 

VI. Real data 

This section illustrates the proposed algorithm on real hyperspectral data. The considered 
hyperspectral image was acquired over Moffett Field (CA, USA) in 1997 by the JPL spectro- 
imager AVIRIS [54]. This image has been used in many works to illustrate hyperspectral signal 
processing algorithms [55], [56]. 

A 50 X 50 sub-image depicted in Fig. |7] (right) has been unmixed using the proposed Bayesian 
approach. The number of endmembers has been estimated as in [18]. More precisely, we retain 
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Fig. 7. Real hyperspectral data: Moffett Field image acquired by AVIRIS in 1997 (left) and the region of interest (right) 
represented in synthetic colors. 



the first R — 1 eigenvalues identified by PCA that capture 95% of the energy contained into 



the dataset. As detailed in III-B.l, we use also PCA to choose the subset V/j_i defined in (11). 



After a short burn-in period A^bi = 50, estimates of the parameters of interest are computing 



following the MMSE principle in ( |40| ) with A^,. = 450. The R = 3 endmembers recovered by 
the proposed joint Bayesian LSMA algorithm are depicted in Fig. [8] (top). These endmember 
spectra are represented in L = 189 spectral bands after removing the water absorption band^ 
These endmembers are characteristic of the coastal area that appears in the image: vegetation, 
water and soil. The corresponding abundance maps, shown Fig. [8] (bottom), are in agreement 
with the previous results presented in [18]. 

VIL Conclusions 

This paper presented a Bayesian model as well as an MCMC algorithm for unsupervised 
unmixing of hyperspectral images, i.e. estimating the endmember spectra in the observed scene 
and their respective abundances for each pixel. Appropriate priors were chosen for the abundance 
vectors to ensure non-negativity and sum-to-one constraints inherent to the linear mixing model. 
Instead of estimating the endmember spectral signatures in the observation space, we proposed to 

''The water vapor absorption bands are usually discarded to avoid poor SNR in these intervals. 
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Fig. 8. Top: the i? = 3 endmember spectra estimated by the algorithm. Bottom: the corresponding abundance maps (black 
(respectively, white) means absence (respectively, presence) of the material). 



estimate their projections onto a suitable subspace. In this subspace, which can be identified by a 
standard dimension reduction technique such as PC A, MNF and HySime, these projections were 
assigned priors that satisfy positivity constraints of the reconstructed endmember spectra. Due to 
the complexity of the posterior distribution, a Gibbs sampling scheme was proposed to generate 
samples asymptotically distributed according to this posterior. The available samples were then 
used to approximate the Bayesian estimators for the different parameters of interest. Results 
of simulations conducted on synthetic and real hyperspectral image illustrated the accuracy 
of the proposed Bayesian method when compared with other algorithms from the literature. 
An interesting open question is whether one can improve performance further by folding the 
intrinsic dimension K of the projection subspace Vk into the Bayesian framework, e.g., by 
applying Bayesian PCA or Bayesian latent variable models. This question is a topic of our 
current research. While this paper introduced a Bayesian method in the context of hyperspectral 
unmixing, the method can also be used for other unmixing applications, such as blind source 
separation, that satisfy positivity and sum-to-one constraints. 
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Appendix 

On the choice of uniform distributions as prior distributions for ar AND THE 
SIZE OF THE SIMPLEX SOLUTION OF THE BSS PROBLEM 

In this appendix, we show that choosing uniform distributions as priors for the abundance 
vectors allows one to favor a posteriori, among two a priori equiprobable polytopes that are 
admissible solutions of the BSS problem, the solution corresponding to the smallest polytope. 



Property: Let M^^^^ and M*^^) be two _R-dimensional convex polytopes of that are admissible 
solutions of the BSS constrained problem, i.e. 



3A(2) 



.(2) 



^1 J ■ ■ ■ J^R 



cA 



R 



1 T 



such as Y = M(i)A(i) = M(2)A(2) where A has been defined in @. Then 

/(mW|y) >/(m(2)|y) 



(46) 



(47) 



vo1(5m(i)) < vo1(>Sm{2)) , 
where vol {S^^ii) ) stands for the volume of the polytope S-^(i) C introduced in ( fTO] ) whose 
vertices are the columns of M^*^ 



Proof: First note that, in absence of noise, as = Ma^, 

Sip ^ U (A) ^ yp\M U (Sm) 

where U {■) stands for the uniform distribution. 
Consequently, 



(48) 



/(Y|M) 



1 



which can be simplified by 



vol {Sm 

/(Y|M): 



p P 



n (y 



PJ 5 



-I p 



vol (Sm) 



(49) 



(50) 
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since, by definition, the observed pixels yp (p = 1, . . . , P) belong to the solution polytope Sm- 
Moreover, Bayes' paradigm allows one to state: 



Since the two solutions M*^^) and M*^^) are a priori equiprobable, from ([50]), it yields: 

/(M«|Y) 



/(M(2)|Y) 



vol (5^(2)) 
vol (5m(i)). 



(52) 



It follows (47). 



Note that the equiprobability assumption underlying the solutions S-^(2) and S-^(2) is not a too 
restrictive hypothesis. Indeed if the variances (r = 1, . . . , i?) had been chosen such that the 



prior distribution in ( [22| ) is sufficiently flat, then: 

/ (M(^)) ^ / (M(2)) . (53) 

Note also that the projection of the polytope S-^(i) onto the subset Vr-i C R^~^ is the simplex 
SrJ^(^) whose vertices are the columns of T^*). 
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